A Joint Source-Channel Model for Machine Transliteration
نویسندگان
چکیده
Most foreign names are transliterated into Chinese, Japanese or Korean with approximate phonetic equivalents. The transliteration is usually achieved through intermediate phonemic mapping. This paper presents a new framework that allows direct orthographical mapping (DOM) between two different languages, through a joint source-channel model, also called n-gram transliteration model (TM). With the n-gram TM model, we automate the orthographic alignment process to derive the aligned transliteration units from a bilingual dictionary. The n-gram TM under the DOM framework greatly reduces system development effort and provides a quantum leap in improvement in transliteration accuracy over that of other state-of-the-art machine learning algorithms. The modeling framework is validated through several experiments for English-Chinese language pair.
منابع مشابه
Statistical Machine Transliteration with Multi-to-Multi Joint Source Channel Model
This paper describes DFKI’s participation in the NEWS2011 shared task on machine transliteration. Our primary system participated in the evaluation for English-Chinese and Chinese-English language pairs. We extended the joint sourcechannel model on the transliteration task into a multi-to-multi joint source-channel model, which allows alignments between substrings of arbitrary lengths in both s...
متن کاملA Modified Joint Source-Channel Model for Transliteration
Most machine transliteration systems transliterate out of vocabulary (OOV) words through intermediate phonemic mapping. A framework has been presented that allows direct orthographical mapping between two languages that are of different origins employing different alphabet sets. A modified joint source–channel model along with a number of alternatives have been proposed. Aligned transliteration...
متن کاملCombining a Two-step Conditional Random Field Model and a Joint Source Channel Model for Machine Transliteration
This paper describes our system for “NEWS 2009 Machine Transliteration Shared Task” (NEWS 2009). We only participated in the standard run, which is a direct orthographical mapping (DOP) between two languages without using any intermediate phonemic mapping. We propose a new two-step conditional random field (CRF) model for DOP machine transliteration, in which the first CRF segments a source wor...
متن کاملDirect Orthographical Mapping for Machine Transliteration
Machine transliteration/back-transliteration plays an important role in many multilingual speech and language applications. In this paper, a novel framework for machine transliteration/backtransliteration that allows us to carry out direct orthographical mapping (DOM) between two different languages is presented. Under this framework, a joint source-channel transliteration model, also called n-...
متن کاملJoint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration
In this paper, we first carry out an investigation on two existing pivot strategies for statistical machine transliteration, namely system-based and model-based strategies, to figure out the reason why the previous model-based strategy performs much worse than the system-based one. We then propose a joint alignment algorithm to optimize transliteration alignments jointly across source, pivot an...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004